For the Women’s Health Initiative Investigators 


Background
Strategies for estrogen receptor (ER)-positive breast cancer risk reduction in postmenopausal women
require screening of large populations to identify those with potential benefit. We evaluated and attempted
to improve the performance of the Breast Cancer Risk Assessment Tool (i.e., the Gail model) for estimating
invasive breast cancer risk by receptor status in postmenopausal women.

Methods
In the Women’s Health Initiative cohort, breast cancer risk estimates from the Gail model and models
incorporating additional or fewer risk factors and 5-year incidence of ER-positive and ER-negative invasive
breast cancers were determined and compared by use of receiver operating characteristics and area under
the curve (AUC) statistics. All statistical tests were two-sided.

Results
Among 147916 eligible women, 3236 were diagnosed with invasive breast cancer. The overall AUC for the
Gail model was 0.58 (95% confidence interval [CI] = 0.56 to 0.60). The Gail model underestimated 5-year
invasive breast cancer incidence by approximately 20% (P<.001), mostly among those with a low estimated risk.
Discriminatory performance was better for the risk of ER-positive cancer (AUC = 0.60, 95% CI = 0.58 to 0.62)
than for the risk of ER-negative cancer (AUC = 0.50, 95% CI = 0.45 to 0.54). Age and age at menopause were
statistically significantly associated with ER-positive but not ER-negative cancers (P = .05 and P = .04 for
heterogeneity, respectively). For ER-positive cancers, no additional risk factors substantially improved the Gail
model prediction. However, a simpler model that included only age, breast cancer in first-degree relatives,
and previous breast biopsy examination performed similarly for ER-positive breast cancer prediction (AUC =
0.58, 95% CI = 0.56 to 0.60); postmenopausal women who were 55 years or older with either a previous breast
biopsy examination or a family history of breast cancer had a 5-year breast cancer risk of 1.8% or higher.

Conclusions
In postmenopausal women, the Gail model identified populations at increased risk for ER-positive but not
ER-negative breast cancers. A model with fewer variables appears to provide a simpler approach for
screening for breast cancer risk.

The Breast Cancer Risk Assessment Tool (i.e., the Gail model) is 
used to predict risk of invasive breast cancer (including both estro- 
gen receptor [ER]-positive and ER-negative disease) in women 35 
years of age or older (1,2). However, strategies to reduce breast 
cancer risk, including use of tamoxifen, raloxifene, and aromatase 
inhibitors, influence almost exclusively ER-positive disease, with 
the use of raloxifene and aromatase inhibitors limited to post- 
menopausal women (3,4). In addition, although many women 
could potentially benefit (5), risk-benefit considerations indicate 
that a large number of postmenopausal women must be screened 
to identify a population who gain net benefit from tamoxifen use 
(6,7). As a result, methods to rapidly identify a population of post- 
menopausal women at increased risk of ER-positive breast cancer 
are needed. 

We therefore examined models of breast cancer risk in partici- 
pants of the Women’s Health Initiative (WHI). Our objective 
was to facilitate the identification of postmenopausal women at 
increased risk for ER-positive invasive breast cancer as candidates 
for potential risk reduction interventions. 

CONTEXT AND CAVEATS 


Prior knowledge 

Because many postmenopausal women must be screened to iden- 
tify a population who will benefit from tamoxifen treatment for 
breast cancer risk reduction, methods to rapidly identify such a 
population are needed. 


Study design 

Data from the observational study and the clinical trial cohorts of 
the Women’s Health Initiative were used. Four prediction models 
were investigated, including the Gail model (tested in the clinical 
trial cohort) and three logistic regression models (trained on the 
observational study cohort and tested on the clinical trial cohort). 


Contribution 

A model with only three risk factors—age, breast cancer in first- 
degree relatives, and previous breast biopsy examination— 
performed nearly as well as the Gail model for the prediction of 
estrogen receptor (ER)-positive breast cancer. 


Implications 

The new model with fewer variables than the Gail model may be 
as effective at identifying women at high risk for ER-positive breast 
cancer who would benefit from risk reduction interventions. 


Limitations 

Information on atypical hyperplasia, reproductive hormone levels, 
mammogram breast density, and bone mineral density, all risk factors 
for breast cancer, was not available. The Gail model was the 
only model evaluated; other models are in clinical use. 





Participants and Methods 


Study Population 
The WHI is a large multicomponent study designed to test three 
chronic disease risk reduction strategies and to examine risk factors 
for these conditions in postmenopausal women. Details of the 
implementation of both the observational study with 93676 
participants and the four randomized clinical trials with 68132 
participants that evaluated menopausal hormone therapy, a low-fat 
dietary intervention, and calcium plus vitamin D supplementation 
have been published (8). Briefly, women were recruited at 40 clini- 
cal centers in the United States largely through direct mailings (9). 
Postmenopausal women, who were aged 50-79 years and unlikely 
to move or die within 3 years, were eligible. Each randomized trial 
had additional eligibility requirements related to safety and the 
intervention under test. Potential participants who were not eligi- 
ble or interested in the randomized trials entered the observational 
study. All clinical trials excluded women with a history of breast 
cancer and required that the baseline mammogram and clinical 
breast examination not be suspicious for breast cancer. Breast 
screening was not required for enrollment in the observational 
study. For these analyses, women with previous invasive breast 
cancer, previous noninvasive breast cancer, previous 
mastectomy, or less than 5 years of follow-up were excluded, leaving 
147916 women who met all eligibility criteria. 

All participants provided written informed consent. Human 
subjects committee approval was obtained at each participating 
institution. 


Data Collection 

Participants provided data on demographics; medical, reproductive, 
and family medical histories; and lifestyle factors, such as smoking 
and alcohol use, and physical activity (10). Menopausal hormone 
therapy use (i.e., use of estrogen alone or combined estrogen plus 
progestin) was ascertained through an interviewer-administered 
questionnaire. 

The Gail model variables include age, ethnicity, age at men- 
arche, age of the mother at the birth of her first live child, number 
of first-degree relatives with breast cancer (0, 1, or >1), number 
of previous breast biopsy examinations (0, 1, or >1), and the pres- 
ence or absence of atypical hyperplasia in the biopsy specimen 
(http://brea.nci.nih.gov/brc/questions.htm). Calculations of 5-year 
risk estimates from the modified Gail model for women in this 
report were made by the National Surgical Adjuvant Breast and 
Bowel Project statistical center by following their usual coding 
procedures on data from individual WHI participants, courtesy of 
Dr Joseph Costantino. Because historical information on atypical 
hyperplasia was not collected in the WHI, all women with previous 
breast biopsy examinations were coded as “unknown” for this 
last variable. 


Follow-up and Breast Cancer Ascertainment 

Breast cancer incidence and mammography use were updated 
annually (in the observational study) or semiannually (in the clinical 
trial) by mail or telephone questionnaires. Self-reported breast 
cancers were verified by centrally trained WHI physician adjudica- 
tors who reviewed pathology reports (11). Final adjudication and 
coding were performed at the WHI Clinical Coordinating Center 
by use of the Surveillance, Epidemiology, and End Results Program 
(12). Only the 3236 invasive breast cancers diagnosed within 5 years 
of enrollment and confirmed by central review were included as 
events; 713 in situ breast cancers were excluded from analyses, and 
144680 women with no invasive breast cancer were coded as control 
subjects. The 363 case patients with missing or borderline 
information on ER status were excluded from subgroup analyses. 


Statistical Analyses 

Risk of invasive breast cancer was initially assessed with the Gail 
model, and the outcome was coded as a four-level nominal variable: 
no invasive breast cancer, ER-positive and progesterone receptor 
(PR)-positive tumors, ER-positive and PR-negative tumors, or 
ER-negative tumors. There were too few ER-negative invasive cancers 
to subdivide this group by PR status. After initial receiver operating 
characteristic (ROC) analyses, the two ER-positive invasive breast cancer 
categories were combined, resulting in a final three-level nominal 
variable: no invasive breast cancer, ER-positive invasive breast cancer, 
and ER-negative invasive breast cancer. 

To explore whether other models could predict ER-positive 
and ER-negative breast cancer with similar or improved accuracy, 
the data were divided into the distinct observational study and 
clinical trial cohorts. The prediction models that we investigated 
were the Gail model, which was tested in the clinical trial cohort, 
and three logistic regression models, which were trained on the 
observational study cohort and tested on the clinical trial cohort. 
The first logistic regression model included the Gail model risk 
factors; the second model added parity, breastfeeding, smoking, 
alcohol, body mass index, physical activity, duration of previous 
estrogen-alone use, and duration of previous estrogen plus progestin 
use. The third, simpler model used only a subset of modified 
Gail model risk factors (age, number of first-degree relatives with 
breast cancer [coded as 0 or ≥1], and number of previous breast 
biopsy examinations [coded as 0, 1, or >1]). 

Table 1. Baseline characteristics and breast cancer diagnosed within 5 years by cohort*

                                            Observational study   Randomized clinical trial
                                            cohort (n = 83348)    cohort (n = 64568)
Characteristic                                  No.    %              No.    %
No. of participants                           83348  100            64568  100
Invasive breast cancer within 5 years of baseline
  No invasive breast cancer                   81384   98            63296   98
  ER-positive tumor                            1489    2              923    1
  ER-negative tumor                             266   <1              195   <1
  Borderline/unknown/missing                    209   <1              154   <1
Age group at screening, y
  50-59                                       27227   33            22784   35
  60-69                                       36824   44            29817   46
  70-79                                       19297   23            11967   19
Ethnicity
  White                                       69835   84            52800   82
  Black                                        6510    8             6558   10
  Hispanic                                     3097    4             2635    4
  American Indian                               359   <1              274   <1
  Asian/Pacific Islander                       2399    3             1434    2
  Unknown                                      1148    1              867    1
Age at menarche, y
  <12                                         18248   22            14080   22
  12-13                                       45885   55            35329   55
  ≥14                                         18861   23            14939   23
Age at menopause, y
  <45                                         17624   22            13751   23
  45-54                                       51234   64            37626   63
  >54                                         11158   14             7963   13
At least one first-degree relative with breast cancer
  No                                          65841   85            52020   86
  Yes                                         11816   15             8333   14
No. of previous breast biopsy examinations
  0                                           63369   78            46306   80
  1                                           12725   16             8444   15
  >1                                           5550    7             3207    6
Parity
  Never pregnant/never had term pregnancy     10432   13             6843   11
  1 child                                      7469    9             5363    8
  2 children                                  21853   26            15049   23
  ≥3 children                                 43027   52            37003   58
Age at birth of first child, y
  <20                                         19817   26            16298   28
  20-29                                       49191   65            37958   65
  ≥30                                          6312    8             4497    8
Cumulative breastfeeding time
  Never                                       40276   49            30566   48
  <1 y                                        30448   37            23804   37
  ≥1 y                                        11368   14             9411   15
Smoking
  Never                                       42200   51            32862   51
  Past                                        35023   43            26074   41
  Current                                      4980    6             4920    8
Alcohol, No. of drinks per day
  <1                                          72720   87            57664   90
  ≥1                                          10493   13             6702   10
Body mass index
  Normal (<25.0 kg/m²)                        33942   41            17688   28
  Overweight (25 to <30 kg/m²)                28045   34            22997   36
  Obese (≥30 kg/m²)                           20397   25            23564   37
Physical activity, METs
  Inactive                                    10988   13            11030   19
  <5                                          15545   19            13901   24
  5-12                                        19479   24            14193   24
  >12                                         36396   44            19284   33
Length of unopposed estrogen use by category
  None                                        51782   62            42906   66
  <5 y                                        10758   13             8966   14
  5 to <10 y                                   6343    8             4214    7
  10 to <15 y                                  4984    6             3281    5
  ≥15 y                                        9480   11             5200    8
Duration of estrogen + progestin use by category
  None                                        58765   71            49547   77
  <5 y                                        11881   14             8151   13
  5 to <10 y                                   6790    8             3979    6
  10 to <15 y                                  3973    5             1991    3
  ≥15 y                                        1938    2              899    1

* METs = metabolic equivalents. 

The discriminatory accuracy of each model was assessed and 
compared by use of the ROC curve (R version 2.3 and R library 
ROCR, R Development Core Team, http://www.R-project.org) 
and the corresponding area under the curve (AUC). For models 
developed in the observational study cohort, the ROC and AUC 
were obtained by applying the models to the independent clinical 
trial cohort. ROC curves plot the true-positive rate (sensitivity) 
versus the false-positive rate (1 - specificity) at a continuum of 
thresholds; a participant is predicted to have breast cancer if her 
estimated probability of breast cancer exceeds a particular thresh- 
old. An ROC curve that corresponds to a fair-coin toss classifier 
(i.e., a nonpredictive model) is a straight line connecting the coor- 
dinates (0,0) to (1,1) and has an AUC of 0.50. An ROC curve that 
corresponds to a perfect classifier is a pair of vertical and horizontal 
lines connecting the coordinates (0,0) to (0,1) to (1,1) and has an 
AUC of 1.00. 
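
The ROC construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in this study (which relied on R and the ROCR library): the function names and the toy risk scores are hypothetical, and the AUC is computed with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) pairs, one per threshold.

    scores: estimated probabilities of breast cancer (toy values below).
    labels: 1 = case, 0 = control.
    A participant is predicted positive when her score is >= the threshold;
    placing thresholds at the observed scores traces the same curve as
    "exceeds a threshold" with cut points just below each score.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nobody predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data, not WHI data.
labels = [1, 1, 0, 0]
perfect = [0.9, 0.8, 0.2, 0.1]          # every case scores above every control
uninformative = [0.5, 0.5, 0.5, 0.5]    # same score for everyone

print(auc(roc_points(perfect, labels)))        # 1.0
print(auc(roc_points(uninformative, labels)))  # 0.5
```

The two toy classifiers reproduce the reference values given in the text: the perfect classifier passes through (0,1) and the uninformative one follows the diagonal.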

To aid in our understanding of the ROC analysis, a single mul- 
tinomial logistic regression model (in SAS PROC LOGISTIC 
version 9.1; SAS Institute, Cary, NC) was used in the pooled clini- 
cal trial and observational study cohorts to examine whether risk 
factors were associated with ER-positive and ER-negative invasive 
breast cancer, separately and combined, and whether the associa- 
tions differed by receptor status. Specifically, the estimated coeffi- 
cients of the risk factors were allowed to vary by ER status. This 
model included the Gail model risk factors and the additional risk 
factors described above; to increase power, family history of breast 
cancer was recoded as 0 or 1 or more. All risk factors were included 
regardless of statistical significance. Odds ratios (ORs), confidence 
intervals (CIs), and P values for tests of main effects and for tests 
of heterogeneity between tumor types were based on Wald statistics. 
All statistical tests were two-sided. 

(Note to Table 1: Due to missing covariate information, not all levels of a particular categorical variable sum to the total number of participants. ER = estrogen receptor.) 


To estimate absolute risks and prevent downward bias, missing 
receptor status was handled by use of a multiple imputation model. 
Because the main goal of this particular analysis was estimation, data 
from the WHI observational study and clinical trials were combined. 
Calibration was verified by use of the Hosmer and 
Lemeshow goodness-of-fit test (13). 


Results 


Women in both the clinical trial and observational study cohorts 
were ethnically diverse and had a mean age of 63 years (range = 50- 
79 years), with 21% being 70 years or older (Table 1). Serial mam- 
mography was common, with an average of 0.7 mammogram per 
year in both cohorts. Given the large sample size, differences 
between cohorts were statistically significant for 
nearly all characteristics. There appeared to be substantive differences 
between the cohorts in age, ethnicity or race, body mass 
index, family history, previous benign breast biopsy examination, 
and use of hormone therapy. Among 147916 women eligible for 
these analyses, 3236 developed invasive breast cancer within 5 
years. ER status was borderline or missing for 363 breast cancer 
patients, and 2412 ER-positive and 461 ER-negative invasive breast 
cancers were diagnosed. 

The ROC curves for predicting invasive breast cancer by use of 
the Gail model 5-year risk probabilities for the 64568 women in 
the WHI clinical trial produced an AUC of 0.58 (95% CI = 0.56 
to 0.60). Discrimination was similar for ER-positive and PR-positive 
tumors and for ER-positive and PR-negative tumors 
(AUC = 0.60 for both). Given this similar ability to predict the two 
ER-positive subtypes, these categories were combined in subsequent modeling. 
For all ER-positive tumors, the usual Gail model threshold of 
1.67%, used for defining increased risk, corresponds to approxi- 
mately 50% sensitivity and 65% specificity. For ER-negative 
tumors, the Gail model was comparable to a random process 
(AUC = 0.50, 95% CI = 0.45 to 0.54) (Fig. 1). 
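
To make the meaning of a single operating point concrete, the following sketch computes sensitivity and specificity at a fixed risk cut point. It is illustrative only: the function name and the ten risk values are hypothetical (they are not WHI data), and flagging women whose estimated risk is at or above the cut point is an assumed convention.

```python
def sensitivity_specificity(risks, outcomes, cut_point):
    """Sensitivity and specificity when women with risk >= cut_point are flagged.

    risks: estimated 5-year risks in percent.
    outcomes: 1 = developed the cancer of interest, 0 = did not.
    """
    tp = fn = tn = fp = 0
    for risk, outcome in zip(risks, outcomes):
        flagged = risk >= cut_point
        if outcome == 1:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical 5-year risk estimates (%), chosen only for illustration.
risks = [2.4, 1.9, 1.5, 1.2, 2.1, 1.7, 1.6, 1.3, 1.1, 0.9]
outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(risks, outcomes, cut_point=1.67)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Each point on an ROC curve is one such (1 - specificity, sensitivity) pair; moving the cut point trades one quantity against the other.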

The accuracy of the Gail model estimated probabilities (calibration) 
for estimating the incidence of invasive breast cancer was 
assessed by comparing observed cases of breast cancer in the 
cohort with the number predicted by the Gail model (as the sums 
of the individual estimated Gail model probabilities of breast can- 
cer). The Gail model 5-year risk estimate statistically significantly 
underestimated the number of breast cancers diagnosed within 5 
years by approximately 20% (3236 observed versus 2562 expected, 
P<.001) with the disparity concentrated among those with lower 
Gail model risk estimates (<1.37% expected 5-year risk). The total 
Gail model estimate was more closely aligned with the number of 
ER-positive breast cancers observed (Table 2). 
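
The observed-versus-expected comparison can be checked directly from the quintile counts in Table 2. The sketch below is a hedged illustration: the Pearson-type statistic is an assumed form of the chi-square goodness-of-fit test (the text specifies only that the test had 5 df), and the critical value is the standard upper 0.1% point of the chi-square distribution with 5 df.

```python
# Counts transcribed from Table 2, risk quintiles in ascending order.
expected = [251, 377, 440, 559, 935]  # predicted by the Gail model
observed = [399, 568, 593, 733, 943]  # diagnosed within 5 years

total_expected = sum(expected)
total_observed = sum(observed)
expected_as_pct_of_observed = 100.0 * total_expected / total_observed

# Pearson-type goodness-of-fit statistic (assumed form of the test).
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Upper 0.1% point of the chi-square distribution with 5 df.
CRITICAL_VALUE_P001_DF5 = 20.515

print(total_expected, total_observed, round(expected_as_pct_of_observed, 1))
print(chi_square > CRITICAL_VALUE_P001_DF5)  # True corresponds to P < .001
```

Most of the statistic comes from the lower four quintiles, which is consistent with the observation that the underestimation was concentrated among women with lower Gail model risk estimates.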

To better understand this difference in predictive performance 
between estimated breast cancer incidence from the Gail model and 
the observed incidence in the WHI cohort, we examined breast 
cancer risk factors by hormone receptor status (ER-positive tumors, 
ER-negative tumors, and cancer-free for 5 years) (Table 3). In 
multinomial logistic regression analyses, the 5-year probability of 
ER-positive breast cancer in the entire cohort was statistically sig- 
nificantly associated with age, ethnicity, family history of breast 
cancer, number of previous breast biopsy examinations, age at 
menopause, parity, age at first birth, smoking status, alcohol use, 
and body mass index. African American women were at statistically 
significantly lower risk of ER-positive disease than white women 
(OR = 0.68, 95% CI = 0.54 to 0.85). 

Prior breast biopsy examination and body mass index were the 
only risk factors that were statistically significantly associated with 
ER-negative disease. The smaller number of ER-negative tumors 
may have precluded detection of some associations; however, the 
data indicate a different pattern of association for most factors with 
ER-positive disease than with ER-negative disease. For ER-nega- 
tive cancers, odds ratios for age, family history, age at menopause, 
greater parity, smoking, and alcohol use were all close to 1.0. The 
association between breast cancer risk and chronologic age (P = 
.05), race or ethnicity (P = .01), and age at menopause (P = .04) 
differed statistically significantly across ER disease subtypes (Table 
3). Age at menarche was not statistically significantly associated 
with either ER-positive or ER-negative breast cancer. 
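
The idea behind a heterogeneity comparison can be sketched from the published odds ratios alone. The example below is a rough approximation and not the test used in this study: it compares a single level of one risk factor (menopause after age 54 versus before age 45, with odds ratios as read from Table 3), recovers standard errors from the 95% confidence intervals, and treats the two estimates as independent, which is not strictly true for coefficients from one multinomial model. The published heterogeneity P values are Wald tests across all levels of a factor, so the result is not expected to match them exactly. Function names are hypothetical.

```python
import math


def log_or_and_se(odds_ratio, lower, upper):
    """Recover the log odds ratio and its standard error from a 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
    return math.log(odds_ratio), se


def heterogeneity_z_test(or_a, ci_a, or_b, ci_b):
    """Approximate two-sided z test that two odds ratios are equal.

    Assumption: the two estimates are treated as independent.
    """
    log_a, se_a = log_or_and_se(or_a, *ci_a)
    log_b, se_b = log_or_and_se(or_b, *ci_b)
    z = (log_a - log_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


# ER-positive: OR 1.55 (1.30 to 1.85); ER-negative: OR 0.93 (0.61 to 1.39).
z, p = heterogeneity_z_test(1.55, (1.30, 1.85), 0.93, (0.61, 1.39))
print(f"z = {z:.2f}, two-sided P = {p:.3f}")
```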

To determine whether other models could improve Gail 
model performance for prediction of ER-positive breast cancer, 
the effects of the Gail model risk factors were reestimated in a 
logistic regression model that used the WHI observational study 
cohort as a training set, and the resulting model was applied to 
the WHI clinical trial cohort as a test set. This approach differed 
from the original approach in that it did not explicitly account for 
competing risks. However, only 5.3% of women in the observational 
study and 4.7% of women in the clinical trial died or were 
lost to follow-up within 5 years of enrollment. Expanding this 
model to add parity, breastfeeding, smoking, alcohol, body mass 
index, physical activity, duration of previous estrogen-alone use, 
and duration of previous estrogen plus progestin use produced 
ROC curves in a similar pattern with only slight improvements in 
the AUC statistics (Fig. 2). By focusing on only ER-positive 
tumors, we simplified the model to include only age, first-degree 
relatives with breast cancer (coded as 0 or ≥1), and number of 
previous breast biopsy examinations (coded as 0, 1, or >1) and we 
obtained an AUC of 0.58 (95% CI = 0.56 to 0.60). The difference 
between the Gail model and the simplified model at the estimated 
1.8% probability threshold was small and not statistically significant 
(difference = 0.02, 95% CI = -0.04 to 0.08) (Fig. 3). The 
absolute risk prediction results for ER-positive invasive breast 
cancer from the simplified model by age increments are shown in 
Table 4. To prevent the downward bias of the predicted 5-year 
risk of ER-positive invasive breast cancers, missing receptor 
status was dealt with by multiple imputations. The imputation 
model included age, ethnicity, and age at menopause. These 
variables were chosen from Table 3 because these variables 
suggested differing risks by receptor status. 

Fig. 1. Receiver operating characteristic analysis and corresponding 
area under the curve (AUC) statistics for Gail model prediction of 
invasive breast cancer risk by receptor status, evaluated on the 
Women’s Health Initiative clinical trial cohort. ER = estrogen receptor; 
PR = progesterone receptor; CI = confidence interval. 

Table 2. Comparison of observed number of invasive breast cancers with expected number from the Gail model in the clinical trial and observational study cohorts*

Gail model risk   Expected from   Observed invasive breast cancers in the entire cohort
quintile, %       Gail model,     Total No.   Gail expected, %   ER+ tumor, No.   ER- tumor, No.
                  total No.
<1.09             251             399         62.9               254              88
(1.09, 1.37]      377             568         66.3               411              85
(1.37, 1.68]      440             593         74.2               455              82
(1.68, 2.16]      559             733         81.7               566              95
>2.16             935             943         99.1               726              111
Total             2562†           3236        79.2               2412             461

* ER = estrogen receptor. 
† The difference between the Gail model expected and observed numbers of invasive breast cancers was statistically significant (P<.001), based on a chi-square goodness-of-fit test with 5 df. The model was based on data from eligible case patients with invasive breast cancer within 5 years of baseline in the clinical trial and observational study cohorts from the Women’s Health Initiative. 

Table 3. Baseline characteristics and multivariable odds ratios* (95% confidence intervals) of invasive breast cancer cases (within 5 years of baseline) by tumor type in the clinical trial and observational study cohorts of the Women’s Health Initiative†

                                    No invasive   ER-positive tumors              ER-negative tumors
                                    breast cancer
Characteristic                          No.   %     No.   %  OR (95% CI)           No.   %  OR (95% CI)
Age group at screening, y (P‡ <.001; P§ = .98; P overall|| <.001; P homo¶ = .05)
  50-59                               49124  34     630  26  1.00 (referent)       156  34  1.00 (referent)
  60-69                               65104  45    1147  48  1.28 (1.14 to 1.44)   213  46  1.01 (0.78 to 1.31)
  70-79                               30452  21     635  26  1.53 (1.33 to 1.76)    92  20  0.98 (0.71 to 1.37)
Race or ethnicity (P‡ = .002; P§ = .32; P overall|| = .006; P homo¶ = .01)
  White                              119815  83    2142  89  1.00 (referent)       369  80  1.00 (referent)
  Black                               12862   9     114   5  0.68 (0.54 to 0.85)    59  13  1.41 (0.96 to 2.06)
  Hispanic                             5641   4      60   2  0.74 (0.54 to 1.03)    17   4  0.90 (0.46 to 1.77)
  American Indian                       625   0       6   0  -                       2   0  -
  Asian/Pacific Islander               3755   3      62   3  1.02 (0.77 to 1.35)    11   2  1.17 (0.62 to 2.21)
  Unknown                              1982   1      28   1  -                       3   1  -
No. of first-degree relatives with breast cancer (P‡ <.001; P§ = .44; P overall|| <.001; P homo¶ = .12)
  0                                  115456  86    1794  79  1.00 (referent)       347  81  1.00 (referent)
  ≥1                                  19533  14     464  21  1.44 (1.28 to 1.62)    80  19  1.12 (0.84 to 1.51)
No. of previous breast biopsy examinations (P‡ <.001; P§ = .001; P overall|| <.001; P homo¶ = .99)
  None                               107550  79    1591  69  1.00 (referent)       303  70  1.00 (referent)
  1                                   20528  15     483  21  1.50 (1.34 to 1.69)    91  21  1.49 (1.14 to 1.96)
  ≥2                                   8456   6     219  10  1.65 (1.41 to 1.94)    39   9  1.70 (1.17 to 2.48)
Age at menarche, y (P‡ [illegible]; P§ = .39; P overall|| = .35; P homo¶ = .83)
  <12                                 31572  22     559  23  1.10 (0.96 to 1.26)   116  25  1.20 (0.87 to 1.66)
  12-13                               79464  55    1297  54  1.01 (0.90 to 1.13)   248  54  1.01 (0.77 to 1.34)
  ≥14                                 33091  23     540  23  1.00 (referent)        95  21  1.00 (referent)
Age at menopause, y (P‡ <.001; P§ = .89; P overall|| <.001; P homo¶ = .04)
  <45                                 30785  23     412  18  1.00 (referent)       104  24  1.00 (referent)
  45-54                               86865  64    1511  66  1.32 (1.16 to 1.51)   268  61  0.93 (0.70 to 1.24)
  >54                                 18619  14     383  17  1.55 (1.30 to 1.85)    64  15  0.93 (0.61 to 1.39)
Parity (P‡ <.001; P§ = .63; P overall|| = .003; P homo¶ = .38)
  Never pregnant/no term pregnancy    16841  12     346  14  1.00 (referent)        45  10  1.00 (referent)
  1 child                             12560   9     209   9  0.63 (0.49 to 0.81)    31   7  0.76 (0.40 to 1.44)
  2 children                          36067  25     629  26  0.71 (0.57 to 0.88)   125  27  1.04 (0.62 to 1.75)
  ≥3 children                         78368  54    1203  50  0.66 (0.54 to 0.81)   256  56  1.01 (0.62 to 1.65)
Age at birth of first child, y (P‡ <.001; P§ [illegible]; P overall|| <.001; P homo¶ = .17)
  <20                                 35334  27     581  26  1.00 (referent)       100  24  1.00 (referent)
  20-29                               85276  65    1388  63  1.08 (0.93 to 1.27)   279  68  1.41 (0.98 to 2.01)
  ≥30                                 10504   8     250  11  1.56 (1.27 to 1.93)    32   8  1.45 (0.86 to 2.45)
Cumulative breastfeeding time (P‡ = .66; P§ = .17; P overall|| = .36; P homo¶ = .13)
  Never                               69290  49    1152  49  1.00 (referent)       221  49  1.00 (referent)
  <1 y                                53061  37     896  38  1.05 (0.94 to 1.17)   160  35  0.81 (0.63 to 1.05)
  ≥1 y                                20335  14     325  14  1.03 (0.89 to 1.20)    72  16  1.05 (0.76 to 1.45)
Smoking (P‡ = .001; P§ = .96; P overall|| = .009; P homo¶ = .25)
  Never                               73524  51    1120  47  1.00 (referent)       239  53  1.00 (referent)
  Past                                59655  42    1106  47  1.20 (1.09 to 1.32)   189  42  0.97 (0.77 to 1.22)
  Current                              9699   7     145   6  1.12 (0.91 to 1.37)    27   6  1.02 (0.65 to 1.60)
Alcohol consumption, drinks per day (P‡ = .02; P§ [illegible]; P overall|| = .08; P homo¶ = .60)
  <1                                 127608  88    2050  85  1.00 (referent)       408  89  1.00 (referent)
  ≥1                                  16743  12     359  15  1.17 (1.02 to 1.33)    50  11  1.06 (0.75 to 1.49)
Body mass index (P‡ <.001; P§ = .03; P overall|| <.001; P homo¶ = .08)
  Normal (<25.0 kg/m²)                50539  35     824  34  1.00 (referent)       163  36  1.00 (referent)
  Overweight (25 to <30 kg/m²)        49958  35     838  35  1.14 (1.02 to 1.27)   134  29  0.83 (0.63 to 1.09)
  Obese (≥30 kg/m²)                   42924  30     729  30  1.26 (1.12 to 1.43)   162  35  1.21 (0.92 to 1.60)
Physical activity, METs (P‡ = .37; P§ = .38; P overall|| = .41; P homo¶ = .57)
  Inactive (0)                        21514  16     359  15  1.00 (referent)        86  20  1.00 (referent)
  <5                                  28821  21     473  20  0.98 (0.84 to 1.15)    86  20  0.75 (0.53 to 1.06)
  5-12                                32907  24     571  25  0.97 (0.84 to 1.13)   107  24  0.81 (0.58 to 1.14)
  ≥12                                 54474  40     915  39  0.90 (0.78 to 1.04)   161  37  0.78 (0.57 to 1.07)

* From a multivariable multinomial logistic regression model containing the predictors shown in the table and also adjusted for duration of baseline estrogen-only use and duration of estrogen plus progestin use. 
† ER = estrogen receptor; OR = odds ratio; CI = confidence interval; homo = homogeneity. Boldface type in the original table indicated statistically significant P values. All statistical tests were two-sided. 
‡ From a multivariable multinomial logistic regression model, chi-square test to determine if the risk factor is predictive of ER-positive tumors. 
§ From a multivariable multinomial logistic regression model, chi-square test to determine if the risk factor is predictive of ER-negative tumors. 
|| From a multivariable multinomial logistic regression model, chi-square test to determine if the risk factor is predictive of either ER-positive or ER-negative tumors. 
¶ From a multivariable multinomial logistic regression model, chi-square test to determine if odds ratios differ between ER-positive and ER-negative tumors for any level of the risk factor. 

Fig. 2. Receiver operating characteristic analysis and corresponding 
area under the curve (AUC) statistics for Gail model prediction of invasive 
breast cancer risk plus additional risk factors by receptor status. 
The model was developed on a training set by use of the observational 
study cohort and evaluated by use of the clinical trial cohort. Variables 
in the training set analyses included Gail model factors (age, ethnicity, 
number of first-degree relatives with breast cancer [0, 1, or >1], previous 
breast biopsy examination [0, 1, or >1], age at menarche, and age 
at birth of first child) plus age at menopause, parity, breastfeeding, 
smoking, alcohol, body mass index, physical activity, duration of previous 
estrogen-alone use, and duration of previous estrogen plus progestin 
use. ER = estrogen receptor; CI = confidence interval. 


Fig. 3. Receiver operating characteristic analysis and corresponding 
area under the curve (AUC) statistics for Gail model prediction of estrogen 
receptor-positive invasive breast cancer risk and for a simpler 
model with fewer risk factors that was developed on a training set by 
use of the observational study cohort and evaluated by use of the clinical 
trial cohort. Variables in the simpler model include age, previous 
breast biopsy (0, 1, or >1), and number of first-degree relatives with 
breast cancer (0 or ≥1). Error bars = 95% confidence intervals (CIs) for 
the sensitivity for a cut point of 1.8%. GM = Gail model. 
[Plot not reproduced: sensitivity versus 1 - specificity. AUC values shown in the plot legend: 0.58 (0.56, 0.60) and 0.60 (0.58, 0.62).] 


JNCI | Articles 1701 


Table 4. Predicted 5-year risk* (%) of estrogen receptor-positive invasive breast cancer in postmenopausal women: simplified model

Participants                                No biopsy   1 biopsy   >1 biopsy
No first-degree relative with breast cancer
  All postmenopausal women, age (y)
    50-54                                   1.0         1.6        1.8
    55-59                                   1.3         2.0        2.3
    60-64                                   1.6         2.4        2.7
    65-69                                   1.7         2.5        2.9
    70-74                                   1.9         2.8        3.2
    ≥75                                     1.9         2.8        3.2
  African American postmenopausal women, age (y)
    50-54                                   0.7         1.2        2.2
    55-59                                   0.6         1.0        1.8
    60-64                                   1.1         1.7        3.2
    65-69                                   1.2         1.9        3.6
    70-74                                   0.9         1.4        2.6
    ≥75                                     0.8         1.3        2.3
≥1 first-degree relative with breast cancer
  All postmenopausal women, age (y)
    50-54                                   1.5         2.3        2.6
    55-59                                   2.0         2.9        3.4
    60-64                                   2.3         3.4        4.0
    65-69                                   2.5         3.7        4.3
    70-74                                   2.7         4.1        4.7
    ≥75                                     2.8         4.1        4.7
  African American postmenopausal women, age (y)
    50-54                                   1.0         1.7        3.1
    55-59                                   0.8         1.4        2.6
    60-64                                   1.5         2.5        4.5
    65-69                                   1.7         2.7        5.0
    70-74                                   1.2         2.0        3.7
    ≥75                                     1.1         1.8        3.3

* Predicted 5-year risk of estrogen receptor-positive invasive breast cancer by 
age category, number of first-degree relatives with breast cancer (0 or ≥1), 
and number of previous breast biopsy examinations (0, 1, or >1). 


All women older than 55 years with either a previous biopsy 
examination or a first-degree relative with breast cancer had a 5-year 
risk of invasive breast cancer that was greater than 1.8%. In an 
exploratory analysis, the same process was applied to African 
American participants in the clinical trial and observational study 
cohorts. For these African American women, women aged 60 years 
or older with a previous biopsy examination and a positive family his- 
tory of breast cancer had a 5-year risk greater than 1.8% (Table 4). 
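
The screening rule stated above for all postmenopausal women can be expressed as a simple table lookup. The sketch below is illustrative only: the risk values are transcribed from the "All postmenopausal women" rows of Table 4, whereas the function names, the age-band helper, and the use of age 55 or older as the lower bound are assumptions made for this example.

```python
# Predicted 5-year risk (%) transcribed from Table 4, all postmenopausal women.
# Each tuple holds the risks for (no biopsy, 1 biopsy, >1 biopsy).
NO_RELATIVE = {
    "50-54": (1.0, 1.6, 1.8),
    "55-59": (1.3, 2.0, 2.3),
    "60-64": (1.6, 2.4, 2.7),
    "65-69": (1.7, 2.5, 2.9),
    "70-74": (1.9, 2.8, 3.2),
    ">=75": (1.9, 2.8, 3.2),
}
WITH_RELATIVE = {
    "50-54": (1.5, 2.3, 2.6),
    "55-59": (2.0, 2.9, 3.4),
    "60-64": (2.3, 3.4, 4.0),
    "65-69": (2.5, 3.7, 4.3),
    "70-74": (2.7, 4.1, 4.7),
    ">=75": (2.8, 4.1, 4.7),
}


def age_band(age):
    """Map an age in years to a Table 4 age band (hypothetical helper)."""
    if age < 50:
        raise ValueError("Table 4 starts at age 50")
    if age >= 75:
        return ">=75"
    low = 50 + 5 * ((age - 50) // 5)
    return f"{low}-{low + 4}"


def predicted_risk(age, n_biopsies, has_relative):
    """Look up the predicted 5-year risk (%) of ER-positive breast cancer."""
    table = WITH_RELATIVE if has_relative else NO_RELATIVE
    return table[age_band(age)][min(n_biopsies, 2)]


def rule_flags(age, n_biopsies, has_relative):
    """Rule of thumb from the text: age 55+ with a biopsy or a family history."""
    return age >= 55 and (n_biopsies >= 1 or has_relative)


# Collect the predicted risk for every flagged combination, ages 50 to 79.
flagged_risks = [
    predicted_risk(age, n_biopsies, has_relative)
    for age in range(50, 80)
    for n_biopsies in (0, 1, 2)
    for has_relative in (False, True)
    if rule_flags(age, n_biopsies, has_relative)
]
print(min(flagged_risks) >= 1.8)  # True if the rule holds for every flagged cell
```

A lookup of this kind needs only three answers from the patient, which is the practical appeal of the simplified model for rapid prescreening.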


Discussion 


In a large cohort of postmenopausal women (50-79 years of age 
at entry), the discriminatory performance of the Gail model was 
similar to that observed in previous studies (14,15), but it underes- 
timated the observed 5-year invasive breast cancer incidence by 
approximately 20%. Discriminatory performance and model calibration 
were somewhat better for estimating population-based risk 
of ER-positive breast cancers, but the performance of the Gail 
model for predicting ER-negative breast cancers was equivalent to 
chance alone. Incorporation of several additional risk factors pro- 
vided only a small improvement in prediction. However, a simpler 
model that incorporated fewer variables was nearly as accurate as 
the Gail model in predicting ER-positive breast cancer risk and 
would be more accessible for routine and rapid prescreening in the 
prevention or routine care setting, in which breast cancer risk is 
only one measure among the many requiring assessment. 
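
The AUC values discussed here can be read as the probability that a randomly chosen woman who developed breast cancer had a higher estimated risk than a randomly chosen woman who did not, so a value of 0.50 corresponds to chance. A minimal Python sketch of that pairwise definition follows. The function name `auc` and the six toy observations are hypothetical and are not WHI data.

```python
def auc(estimated_risks, outcomes):
    """Area under the ROC curve computed by pairwise comparison.

    estimated_risks: predicted risks, one per woman.
    outcomes: 1 if breast cancer was diagnosed during follow-up, else 0.
    Tied case/non-case pairs count as half-concordant.
    """
    cases = [r for r, y in zip(estimated_risks, outcomes) if y == 1]
    noncases = [r for r, y in zip(estimated_risks, outcomes) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_risk in cases:
        for noncase_risk in noncases:
            if case_risk > noncase_risk:
                concordant += 1.0
            elif case_risk == noncase_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(noncases))


if __name__ == "__main__":
    # Toy data (hypothetical): estimated 5-year risks (%) and observed outcomes.
    toy_risks = [1.0, 1.3, 1.6, 2.0, 2.4, 2.9]
    toy_outcomes = [0, 0, 1, 0, 1, 1]
    print(round(auc(toy_risks, toy_outcomes), 3))
```

The double loop is quadratic in cohort size, so it serves only to make the definition concrete; for a cohort the size of the WHI, a rank-based computation would be the practical choice.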

In our analyses, the association of putative risk factors with 
breast cancer prediction varied by ER status. For ER-positive 
breast cancers among postmenopausal women, the Gail model 
components of age, ethnicity, age of the mother at birth of her first 
live child, number of first-degree relatives with breast cancer, and 
number of prior breast biopsy examinations were statistically sig- 
nificantly associated with breast cancer risk. Age at menarche was 
not associated with risk. Also associated with the risk of an ER- 
positive tumor were age at menopause, parity, and body mass 
index. In contrast, only prior breast biopsy examination and body 
mass index were statistically significantly associated with the risk of 
ER-negative breast cancer, although the smaller number of 
patients with ER-negative breast cancer may have limited our abil- 
ity to detect some associations, particularly for race or ethnicity. 

Although the Gail model has been validated for predicting total 
breast cancer risk in several settings (14,15), including both pre- and 
postmenopausal women, the utility of the Gail model for predicting 
ER-positive compared with ER-negative breast cancers in post- 
menopausal women has not been previously recognized to our 
knowledge. In these analyses, a woman’s age, a Gail model component, 
was associated with the risk of ER-positive breast cancer 
but not with the risk of ER-negative breast cancer. As a result, the 
Gail model appears to have the ability to differentially predict breast 
cancer risk by receptor subgroup in postmenopausal women. 

Despite its well-documented predictive performance (14,15), 
the Gail model statistically significantly underestimated breast 
cancer incidence in the WHI cohort of postmenopausal women. 
The initial four studies that validated the original Gail model 
included not only postmenopausal women but also some much 
younger premenopausal women (1,16-18), and two of these studies 
(17,18) enrolled women no older than 54 and 61 years, 
respectively. In these studies, mammography was not prespecified 
and was rarely used before the mid-1980s. Because tumors detected 
by mammography are more likely than those detected by other 
means to be ER-positive rather than ER-negative (19) and 
older women are substantially more likely than younger women to 
develop ER-positive tumors (20), it is likely that these older studies 
did not optimally detect ER-positive disease. In contrast, the WHI 
cohort included only postmenopausal women who were 50 years 
or older, with a substantial number older than 70 years, and had 
comprehensively used mammography. In addition, a secular 
change in breast biopsy procedures had occurred, beginning in the 
early 1990s, away from open surgical biopsy examination to the 
common use of image-guided percutaneous core biopsy examina- 
tions, which are associated with less morbidity and a lower thresh- 
old for use (21). As a result of these changes, the breast biopsy rate 
per 100 000 Medicare beneficiaries increased by 43% between 
1999 and 2004 (22). These factors likely contributed to a more 
comprehensive ascertainment of all patients with breast cancer in 
the WHI cohort, especially for those with ER-positive disease, and 
likely enhanced our ability to discriminate between the influences 
on ER-positive and ER-negative disease. Future development of 
breast cancer risk models, consequently, should consider pre- 
menopausal and postmenopausal women separately and segregate 
risk by hormone receptor status. 

Other groups have examined the risk factors associated with 
breast cancer in hormone receptor subgroups (20,23-25). In a 
meta-analysis of observational studies, parity and age at first child’s 
birth were associated with ER-positive and PR-positive tumors but 
not with ER-negative and PR-negative breast cancers (22). In 
another report (19), statistically significant heterogeneity among 
the four ER and PR categories was observed for some risk factors 
but not for prior breast biopsy examinations, family history of 
breast cancer, alcohol use, and height. Direct comparison with our 
analyses was precluded by differences in study populations, breast 
cancer receptor categories, and the risk factors examined. 

Despite evaluation of multiple additional risk factors, we did 
not identify a model that would clearly improve the accuracy of 
risk prediction for ER-positive breast cancers over that of the Gail 
model. However, we did identify a simpler model with only three 
variables—age, family history of breast cancer in first-degree rela- 
tives, and previous breast biopsy examination—which had predic- 
tive accuracy approaching that of the Gail model. 

Although many women could potentially benefit from tamoxi- 
fen use, there has been reluctance to incorporate Gail model breast 
cancer risk assessment in routine clinical practice (6,26-28). In one 
survey (6), only 11% of California primary care physicians had used 
the Gail model for risk assessment in the past year. In a recent 
national survey of primary care providers (29), only 16% agreed 
that “it is easy to determine” who is eligible for breast cancer risk 
reduction strategies and only 25% had prescribed tamoxifen for risk 
reduction in the past year. As a result, development of new, simpler 
risk models is an identified research priority (30). The simplified 
model described above provides a straightforward approach to ini- 
tial screening for risk of ER-positive breast cancer in postmeno- 
pausal women and does not require computer use. Women who 
were 55 years or older with either a first-degree relative with breast 
cancer or a previous breast biopsy have a 5-year risk of 1.8% or 
higher (which is higher than the Gail model threshold of 1.67%). 

In previous analyses of the WHI cohort (31), African American 
women were identified as being at substantially lower risk for ER- 
positive breast cancer but at substantially higher risk for ER-nega- 
tive, high-grade breast cancers with poor prognosis. These analyses 
suggest that African American women who were 60 years or older 
with a first-degree relative with breast cancer and a previous breast 
biopsy examination may have a 5-year breast cancer risk of 1.8% 
or higher. Given the sample size in this subgroup, additional 
studies are needed to confirm this finding. Because evaluation of 
the Gail model in multiethnic populations that include African 
American women has been preliminary (32,33) or disappointing 
(34), further model development in minority populations of 
African American and Hispanic women is a research priority. 

The demonstration that the Gail model and our proposed sim- 
pler model can predict the risk of ER-positive breast cancer has 
several clinical implications. Although the selective estrogen 
receptor modulator (SERM) tamoxifen is approved by the Food and 
Drug Administration (FDA) for breast cancer risk reduction (35), 
a large number of postmenopausal women must be screened to 
identify potential candidates. By one calculation, only one of 142 
screened postmenopausal women aged 60-79 years has a 
favorable risk-benefit balance for tamoxifen use (6). The SERM raloxifene has 
recently been approved by the FDA for breast cancer risk reduc- 
tion in postmenopausal women as well and has a somewhat more 
favorable side effect profile (36). The current Gail model or our 
simpler model could identify populations of postmenopausal 
women appropriate for further consideration for breast cancer risk 
reduction interventions. For interactions with individual women in 
the clinic, the limitations of any risk assessment estimate should be 
acknowledged and the risk or benefit of any intervention should be 
carefully considered (3,26). 

Benign breast disease, a recognized breast cancer risk factor, 
may integrate the effects of hormone exposures and breast tissue 
response that lead to mammographic breast alterations and 
indications for a biopsy examination. In the future, histologic 
subclassification of benign breast disease (nonproliferative versus 
proliferative and the magnitude of lobular involution), which fur- 
ther differentiates breast cancer risk (37-39), may lead to more 
reliable risk estimation. 

The strengths of this study include the prospective design, a 
large racially representative population that is well characterized 
for breast cancer risk factors and has serial assessment of mam- 
mography use, central adjudication of breast cancer pathology 
reports, and information on breast cancer hormone receptor 
status. In addition, the risk assessment models evaluated in this 
report were developed in one WHI population (the observational 
study cohort) and independently tested in a separate population 
(the clinical trial cohort), in which many of the risk assessment 
procedures and breast cancer outcomes were determined similarly. 

Our study has several limitations. Information on atypical 
hyperplasia, a Gail model component, was not available in our 
cohort. However, the Gail model does allow for missing histologic 
subclassification and that was how we coded the WHI data. We 
evaluated only one model and recognize other risk prediction 
models are in clinical use (29). We selected the Gail model because 
it is the most commonly cited model, is the most frequently used 
by care providers in the United States, and is the basis for the 
FDA-approved indication for use of tamoxifen for prevention. 

Another study limitation is that reproductive hormone levels, 
mammogram breast density, and bone mineral density, which are 
strongly associated with breast cancer risk (40-43), were not avail- 
able for these analyses. However, analyses incorporating these risk 
factors have provided mixed results. Although estrogen levels have 
been commonly related to breast cancer (40,43,44), in a random- 
ized trial involving women at relatively high risk of breast cancer, 
estrogen and testosterone levels were not associated with either 
subsequent breast cancer risk or risk reduction by tamoxifen (45). 
The addition of mammographic breast density has not improved 
(46) or only modestly improved (47,48) breast cancer prediction 
over the Gail model. To our knowledge, the effect of adding bone 
mineral density to Gail model prediction has not been 
reported. At present, therefore, the role of these three factors in 
the assessment of breast cancer risk in routine clinical practice 
remains to be determined. 

In summary, we found that among postmenopausal women who 
receive regular mammographic examinations, the Gail model pre- 
dicts ER-positive breast cancer risk at the population level but does 
not predict the risk of ER-negative tumors. A model incorporating 
only age, family history of breast cancer, and previous breast biopsy 
examinations provides a simpler approach for initial identification 
of populations of postmenopausal women at elevated risk for ER- 
positive breast cancer who may benefit from further evaluation for risk 
reduction interventions. Future attempts to improve breast cancer 
risk models should consider premenopausal and postmenopausal 
women separately and segregate risk by hormone receptor status. 